Assessment & Standards
- Information Technology > Security & Privacy (0.48)
- Education > Assessment & Standards (0.46)
Viome Full Body Intelligence Test Review: Little Clarity, Pricey Supplements
Virtually every aspect of your health can be traced back to your microbiome. But some tests are better than others. Some of the recipes look tasty. I admit it: I'm a sucker for metrics. Fitness trackers that keep tabs on my steps and sleep? A DEXA scan to give me too much information about my body composition?
- North America > United States > New York (0.04)
- North America > United States > California (0.04)
- Europe > Slovakia (0.04)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Consumer Health (1.00)
- Education > Assessment & Standards > Measuring Intelligence (0.44)
Covariate-assisted Grade of Membership Models via Shared Latent Geometry
The grade of membership model is a flexible latent variable model for analyzing multivariate categorical data through individual-level mixed membership scores. In many modern applications, auxiliary covariates are collected alongside responses and encode information about the same latent structure. Traditional approaches to incorporating such covariates typically rely on fully specified joint likelihoods, which are computationally intensive and sensitive to misspecification. We introduce a covariate-assisted grade of membership model that integrates response and covariate information by exploiting their shared low-rank simplex geometry, rather than modeling their joint distribution. We propose a likelihood-free spectral estimation procedure that combines heterogeneous data sources through a balance parameter controlling their relative contribution. To accommodate high-dimensional and heteroskedastic noise, we employ heteroskedastic principal component analysis before performing simplex-based geometric recovery. Our theoretical analysis establishes weaker identifiability conditions than those required in the covariate-free model, and further derives finite-sample, entrywise error bounds for both mixed membership scores and item parameters. These results demonstrate that auxiliary covariates can provably improve latent structure recovery, yielding faster convergence rates in high-dimensional regimes. Simulation studies and an application to educational assessment data illustrate the computational efficiency, statistical accuracy, and interpretability gains of the proposed method. The code for reproducing these results is open-source and available at https://github.com/Toby-X/Covariate-Assisted-GoM.
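To make the idea of a balance parameter concrete, here is a minimal sketch of one way two data sources that share a low-rank structure could be combined spectrally. It is an assumption for illustration, not the paper's estimator: it sums the Gram matrices of a noiseless expected-response matrix `R` and a covariate matrix `X`, weighted by a hypothetical `alpha`, and takes the top-K eigenvectors. It omits heteroskedastic PCA and the simplex-based recovery step entirely, and all names and data are made up.

```python
import numpy as np

def shared_subspace(R, X, K, alpha=1.0):
    """Top-K eigenvectors of R R^T + alpha * X X^T (illustrative only)."""
    G = R @ R.T + alpha * (X @ X.T)
    _, eigvecs = np.linalg.eigh(G)  # eigenvalues come back in ascending order
    return eigvecs[:, -K:]          # keep the K leading eigenvectors

# Synthetic, noiseless example: both sources are driven by the same memberships Pi.
rng = np.random.default_rng(0)
N, J, P, K = 50, 12, 6, 3
Pi = rng.dirichlet(np.ones(K), size=N)  # N x K mixed membership scores (rows on the simplex)
Theta = rng.random((J, K))              # hypothetical item parameters
B = rng.random((K, P))                  # hypothetical covariate loadings
R = Pi @ Theta.T                        # expected responses
X = Pi @ B                              # covariates

U = shared_subspace(R, X, K, alpha=0.5)
# If U spans the column space of Pi, projecting Pi onto U leaves almost nothing behind.
residual = np.linalg.norm(Pi - U @ (U.T @ Pi))
print(residual)
```

In this noiseless setting both Gram matrices have the same column space as `Pi`, so the weighting only matters once noise is added; choosing `alpha` then trades off how much each source is trusted.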
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.87)
An Autoencoder-Like Nonnegative Matrix Co-Factorization for Improved Student Cognitive Modeling
Student cognitive modeling (SCM) is a fundamental task in intelligent education, with applications ranging from personalized learning to educational resource allocation. By exploiting students' response logs, SCM aims to predict their exercise performance as well as estimate knowledge proficiency in a subject. Data mining approaches such as matrix factorization can obtain high accuracy in predicting student performance on exercises, but the knowledge proficiency is unknown or poorly estimated. The situation is further exacerbated if only sparse interactions exist between exercises and students (or knowledge concepts). To solve this dilemma, we root monotonicity (a fundamental psychometric theory on educational assessments) in a co-factorization framework and present an autoencoder-like nonnegative matrix co-factorization (AE-NMCF), which improves the accuracy of estimating the student's knowledge proficiency via an encoder-decoder learning pipeline. The resulting estimation problem is nonconvex with nonnegative constraints. We introduce a projected gradient method based on block coordinate descent with Lipschitz constants and guarantee the method's theoretical convergence. Experiments on several real-world data sets demonstrate the efficacy of our approach in terms of both performance prediction accuracy and knowledge estimation ability, when compared with existing student cognitive models.
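As a rough illustration of the optimization style mentioned above, the following sketch applies block coordinate descent with projected gradient steps to plain nonnegative matrix factorization. It is not AE-NMCF: there is no co-factorization, encoder-decoder structure, or monotonicity constraint, and the function name and synthetic data are assumptions. Each block uses a step size of one over the Lipschitz constant of its gradient, which is what makes the loss non-increasing.

```python
import numpy as np

def nmf_projected_bcd(X, rank, n_iter=200, seed=0):
    """Factor X ~ W @ H with W, H >= 0 by alternating projected gradient steps."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    losses = []
    for _ in range(n_iter):
        # Block 1: update W with H fixed. The gradient of 0.5 * ||X - W H||_F^2
        # with respect to W is (W H - X) H^T; its Lipschitz constant is ||H H^T||_2.
        lip_w = np.linalg.norm(H @ H.T, 2) + 1e-12
        W = np.maximum(W - ((W @ H - X) @ H.T) / lip_w, 0.0)  # project onto W >= 0
        # Block 2: update H with W fixed, using Lipschitz constant ||W^T W||_2.
        lip_h = np.linalg.norm(W.T @ W, 2) + 1e-12
        H = np.maximum(H - (W.T @ (W @ H - X)) / lip_h, 0.0)  # project onto H >= 0
        losses.append(0.5 * np.linalg.norm(X - W @ H) ** 2)
    return W, H, losses

# Synthetic nonnegative matrix standing in for a student-by-exercise score matrix.
rng = np.random.default_rng(1)
X_demo = rng.random((30, 4)) @ rng.random((4, 20))
W, H, losses = nmf_projected_bcd(X_demo, rank=4)
print(losses[0], losses[-1])
```

Because each block subproblem is a convex quadratic, a projected step of size 1/L cannot increase the loss, which is the basic ingredient behind convergence arguments of this kind.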
Can Language Models Teach? Teacher Explanations Improve Student Performance via Personalization
A hallmark property of explainable AI models is the ability to teach other agents, communicating knowledge of how to perform a task. While Large Language Models (LLMs) perform complex reasoning by generating explanations for their predictions, it is unclear whether they also make good teachers for weaker agents. To address this, we consider a student-teacher framework between two LLM agents and study if, when, and how the teacher should intervene with natural language explanations to improve the student's performance. Since communication is expensive, we define a budget such that the teacher only communicates explanations for a fraction of the data, after which the student should perform well on its own. We decompose the teaching problem along four axes: (1) whether the teacher's test-time intervention improves student predictions, (2) when it is worth explaining a data point, (3) how the teacher should personalize explanations to better teach the student, and (4) whether teacher explanations also improve student performance on future unexplained data.
Human or AI? Comparing Design Thinking Assessments by Teaching Assistants and Bots
Khan, Sumbul, Liow, Wei Ting, Ang, Lay Kee
ORCID: 0000-0003-2811-1194 Abstract: As design thinking education is growing in secondary and tertiary education, educators face a mounting challenge of evaluating creative artefacts that comprise visual and textual elements. Traditional, rubric-based methods of assessment are laborious, time-consuming, and inconsistent, due to their reliance on Teaching Assistants (TAs) in large, multi-section cohorts. This paper presents an exploratory study to investigate the reliability and perceived accuracy of AI-assisted assessment vis-à-vis TA-assisted assessment in evaluating student posters in design thinking education. Two activities were conducted with 33 Ministry of Education (MOE), Singapore school teachers, with the objective (1) to compare AI-generated scores with TA grading across three key dimensions: empathy and user understanding, identification of pain points and opportunities, and visual communication, and (2) to understand teacher preferences for AI-assigned, TA-assigned, and hybrid scores. Results showed low statistical agreement between instructor and AI scores for empathy and pain points, though slightly higher alignment for visual communication. Teachers generally preferred TA-assigned scores in six of ten samples. Qualitative feedback highlighted AI's potential for formative feedback, consistency, and student self-reflection, but raised concerns about its limitations in capturing contextual nuance and creative insight. The study underscores the need for hybrid assessment models that integrate computational efficiency with human insights. This research contributes to the evolving conversation around responsible AI adoption in creative disciplines, emphasizing the balance between automation and human judgment for scalable and pedagogically sound assessment practices.
Design thinking is a human-centered approach to innovation that draws from the designer's toolkit to integrate the needs of people, the possibilities of technology, and the requirements for business success. It is a non-linear, iterative process that teams use to understand users, challenge assumptions, redefine problems, and create innovative solutions to prototype and test.
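The abstract above reports agreement between human and AI scores without naming the statistic used. As one common option, the sketch below computes Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The choice of kappa, the function name, and the rubric scores are all assumptions for illustration and are not data from the study.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical scores."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same score by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal frequency per category.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)

# Made-up rubric scores (1-3) for ten posters on a single dimension.
ta_scores = [1, 2, 3, 3, 2, 1, 2, 3, 1, 2]
ai_scores = [1, 2, 3, 2, 2, 1, 3, 3, 2, 2]
kappa = cohens_kappa(ta_scores, ai_scores)
print(round(kappa, 3))
```

For ordinal rubric scores a weighted kappa or an intraclass correlation may be more appropriate, since those treat a one-point disagreement as milder than a two-point one.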
- Asia > Singapore (0.27)
- Europe > Netherlands > South Holland > Delft (0.04)
- Education > Educational Setting > Higher Education (1.00)
- Education > Assessment & Standards (1.00)
- Education > Educational Technology > Educational Software > Computer-Aided Assessment (0.68)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.86)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.70)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
AraLingBench: A Human-Annotated Benchmark for Evaluating Arabic Linguistic Capabilities of Large Language Models
Zbeeb, Mohammad, Hammoud, Hasan Abed Al Kader, Mukalled, Sina, Rizk, Nadine, Karnib, Fatima, Lakkis, Issam, Mohanna, Ammar, Ghanem, Bernard
The benchmark spans five core categories (grammar, morphology, spelling, reading comprehension, and syntax) through 150 expert-designed multiple-choice questions that directly assess structural language understanding. Evaluating 35 Arabic and bilingual LLMs reveals that current models demonstrate strong surface-level proficiency but struggle with deeper grammatical and syntactic reasoning. AraLingBench highlights a persistent gap between high scores on knowledge-based benchmarks and true linguistic mastery, showing that many models succeed through memorization or pattern recognition rather than authentic comprehension. By isolating and measuring fundamental linguistic skills, AraLingBench provides a diagnostic framework for developing Arabic LLMs. The full evaluation code is publicly available on GitHub.
- Europe > Austria > Vienna (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Asia > Middle East > Lebanon > Beirut Governorate > Beirut (0.05)
- Research Report (0.51)
- Questionnaire & Opinion Survey (0.34)
Automatic Essay Scoring and Feedback Generation in Basque Language Learning
Azurmendi, Ekhi, Arregi, Xabier, de Lacalle, Oier Lopez
This paper introduces the first publicly available dataset for Automatic Essay Scoring (AES) and feedback generation in Basque, targeting the CEFR C1 proficiency level. The dataset comprises 3,200 essays from HABE, each annotated by expert evaluators with criterion-specific scores covering correctness, richness, coherence, cohesion, and task alignment, enriched with detailed feedback and error examples. We fine-tune open-source models, including RoBERTa-EusCrawl and Latxa 8B/70B, for both scoring and explanation generation. Our experiments show that encoder models remain highly reliable for AES, while supervised fine-tuning (SFT) of Latxa significantly enhances performance, surpassing state-of-the-art (SoTA) closed-source systems such as GPT-5 and Claude Sonnet 4.5 in scoring consistency and feedback quality. We also propose a novel evaluation methodology for assessing feedback generation, combining automatic consistency metrics with expert-based validation of extracted learner errors. Results demonstrate that the fine-tuned Latxa model produces criterion-aligned, pedagogically meaningful feedback and identifies a wider range of error types than proprietary models. This resource and benchmark establish a foundation for transparent, reproducible, and educationally grounded NLP research in low-resource languages such as Basque.
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Europe > Spain > Basque Country (0.04)
- Europe > Faroe Islands > Streymoy > Tórshavn (0.04)
- Education > Assessment & Standards > Student Performance (0.73)
- Education > Curriculum > Subject-Specific Education (0.50)
Uncovering Students' Inquiry Patterns in GenAI-Supported Clinical Practice: An Integration of Epistemic Network Analysis and Sequential Pattern Mining
Wei, Jiameng, Dang, Dinh, Yang, Kaixun, Stokes, Emily, Mazeh, Amna, Lim, Angelina, Dai, David Wei, Moore, Joel, Fan, Yizhou, Gasevic, Danijela, Gasevic, Dragan, Chen, Guanliang
Assessment of medication history-taking has traditionally relied on human observation, limiting scalability and detailed performance data. While Generative AI (GenAI) platforms enable extensive data collection and learning analytics provide powerful methods for analyzing educational traces, these approaches remain largely underexplored in pharmacy clinical training. This study addresses this gap by applying learning analytics to understand how students develop clinical communication competencies with GenAI-powered virtual patients, a crucial endeavor given the diversity of student cohorts, varying language backgrounds, and the limited opportunities for individualized feedback in traditional training settings. We analyzed 323 students' interaction logs across Australian and Malaysian institutions, comprising 50,871 coded utterances from 1,487 student-GenAI dialogues. Combining Epistemic Network Analysis to model inquiry co-occurrences with Sequential Pattern Mining to capture temporal sequences, we found that high performers demonstrated strategic deployment of information recognition behaviors. Specifically, high performers centered inquiry on recognizing clinically relevant information, integrating rapport-building and structural organization, while low performers remained in routine question-verification loops. Demographic factors, including first-language background, prior pharmacy work experience, and institutional context, also shaped distinct inquiry patterns. These findings reveal inquiry patterns that may indicate clinical reasoning development in GenAI-assisted contexts, providing methodological insights for health professions education assessment and informing adaptive GenAI system design that supports diverse learning pathways.
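To give a feel for what mining temporal sequences from coded dialogues involves, the sketch below counts contiguous n-grams of utterance codes and keeps those that occur in at least a minimum share of dialogues. This is a deliberately simplified stand-in for sequential pattern mining (full algorithms also handle gapped subsequences), and the code labels and dialogues are hypothetical, not taken from the study.

```python
from collections import Counter

def frequent_ngrams(sequences, n=2, min_support=0.5):
    """Return contiguous n-grams whose dialogue-level support is >= min_support."""
    support = Counter()
    for seq in sequences:
        # Use a set so each pattern is counted at most once per dialogue.
        seen = {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}
        support.update(seen)
    total = len(sequences)
    return {p: c / total for p, c in support.items() if c / total >= min_support}

# Hypothetical coded dialogues; each item is the code assigned to one student utterance.
dialogues = [
    ["question", "verify", "question", "verify"],
    ["rapport", "question", "recognize"],
    ["question", "verify", "recognize"],
]
patterns = frequent_ngrams(dialogues, n=2, min_support=0.5)
print(patterns)
```

Support is measured per dialogue rather than per occurrence, so a loop repeated many times within one conversation does not dominate the result.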
- Oceania > Australia (0.06)
- South America > Uruguay > Maldonado > Maldonado (0.04)
- Asia > Malaysia (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Instructional Material (1.00)
- Health & Medicine > Health Care Providers & Services (0.91)
- Education > Assessment & Standards > Student Performance (0.34)
AI-Enabled grading with near-domain data for scaling feedback with human-level accuracy
Agarwal, Shyam, Moghimi, Ali, Haudek, Kevin C.
Constructed-response questions are crucial to encourage generative processing and test a learner's understanding of core concepts. However, the limited availability of instructor time, large class sizes, and other resource constraints pose significant challenges in providing timely and detailed evaluation, which is crucial for a holistic educational experience. In addition, providing timely and frequent assessments is challenging since manual grading is labor-intensive, and automated grading is difficult to generalize to every possible response scenario. This paper proposes a novel and practical approach to grading short-answer constructed-response questions. We discuss why this problem is challenging, define the nature of questions on which our method works, and finally propose a framework that instructors can use to evaluate their students' open responses, utilizing near-domain data such as data from similar questions administered in previous years. The proposed method outperforms state-of-the-art machine learning models as well as non-fine-tuned large language models like GPT 3.5, GPT 4, and GPT 4o by a considerable margin of over 10-20% in some cases, even after providing the LLMs with reference/model answers. Our framework does not require pre-written grading rubrics and is designed explicitly with practical classroom settings in mind. Our results also reveal exciting insights about learning from near-domain data, including what we term accuracy and data advantages using human-labeled data, and we believe this is the first work to formalize the problem of automated short-answer grading based on near-domain data.
- North America > United States > Michigan (0.04)
- North America > United States > California > Yolo County > Davis (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- Research Report > New Finding (1.00)
- Instructional Material (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Education > Curriculum (1.00)
- Education > Educational Setting > Online (0.93)